Automatically Extracting Web API Specifications from HTML Documentation

Authors

  • Jinqiu Yang
  • Erik Wittern
  • Annie T. T. Ying
  • Julian Dolby
  • Lin Tan
Abstract

Web API specifications are machine-readable descriptions of APIs. These specifications, in combination with related tooling, simplify and support the consumption of APIs. However, despite the increased distribution of web APIs, specifications are rare, and their creation and maintenance rely heavily on manual effort by third parties. In this paper, we propose an automatic approach and an associated tool, called D2Spec, for extracting specifications from web API documentation pages. Given a seed online documentation page for an API, D2Spec first crawls all documentation pages for that API, and then uses a set of machine-learning techniques to extract the base URL, path templates, and HTTP methods, which collectively describe the endpoints of an API. We evaluated whether D2Spec can accurately extract endpoints from the documentation of 120 web APIs. The results showed that D2Spec achieved a precision of 87.5% in identifying base URLs, a precision of 81.3% and a recall of 80.6% in generating path templates, and a precision of 84.4% and a recall of 76.2% in extracting HTTP methods. In addition, we found that D2Spec was useful when applied to APIs with pre-existing API specifications: D2Spec revealed many inconsistencies between web API documentation and the corresponding publicly available specifications. Thus, D2Spec can be used by web API providers to keep documentation and specifications synchronized.
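To make the three extracted elements concrete, the following is a minimal sketch of what a base URL, path templates, and HTTP methods look like once pulled out of documentation text, together with the precision and recall measures used in the evaluation. It is not D2Spec's method: the abstract describes machine-learning techniques, whereas this sketch uses a naive regular-expression heuristic. The host `api.example.com`, the sample documentation text, and all function names are hypothetical and chosen only for illustration.

```python
import re
from urllib.parse import urlparse

# Matches "<METHOD> <absolute URL>" as it might appear in documentation text.
ENDPOINT_RE = re.compile(r"\b(GET|POST|PUT|PATCH|DELETE)\s+(https?://[^\s<>\"']+)")

# Hypothetical documentation snippet (not taken from any real API).
SAMPLE_DOC = """
GET https://api.example.com/v1/users
GET https://api.example.com/v1/users/{user_id}
POST https://api.example.com/v1/users
DELETE https://api.example.com/v1/users/{user_id}/keys/{key_id}
"""


def extract_endpoints(doc_text):
    """Return (base_url, endpoints), where endpoints is a set of
    (http_method, path_template) pairs.

    Naive heuristic: assumes every URL in the text shares one origin.
    """
    matches = ENDPOINT_RE.findall(doc_text)
    if not matches:
        return None, set()

    parsed = [(method, urlparse(url)) for method, url in matches]
    first = parsed[0][1]
    origin = f"{first.scheme}://{first.netloc}"
    segments = [u.path.strip("/").split("/") for _, u in parsed]

    # Base path = leading segments shared by all URLs, always leaving at
    # least one segment per URL for the path template.
    limit = min(len(s) for s in segments) - 1
    shared = 0
    for column in zip(*segments):
        if shared >= limit or len(set(column)) != 1:
            break
        shared += 1

    base_url = "/".join([origin] + segments[0][:shared])
    endpoints = {
        (method, "/" + "/".join(segs[shared:]))
        for (method, _), segs in zip(parsed, segments)
    }
    return base_url, endpoints


def precision_recall(extracted, expected):
    """Precision and recall of an extracted set against a ground-truth set."""
    true_positives = len(extracted & expected)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    base_url, endpoints = extract_endpoints(SAMPLE_DOC)
    print(base_url)
    for method, template in sorted(endpoints):
        print(method, template)
```

In this sketch, precision is the fraction of extracted endpoints that appear in the ground truth, and recall is the fraction of ground-truth endpoints that were extracted, which is the usual reading of the figures reported in the abstract.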

Similar resources

The Design of Distributed Hyperlinked Programming Documentation

HotJava is a World-Wide Web browser that adds dynamic behavior to hypertext access by supporting the downloading and execution of architecture-neutral, interactive applets from inside a Web page. HotJava is written in Java, a new object-oriented language and environment developed at Sun Microsystems. This paper describes the design of the documentation for Java's application programming interfa...

Automated Information Extraction from Web APIs Documentation

A fundamental characteristic of Web APIs is the fact that, de facto, providers hardly follow any standard practices while implementing, publishing, and documenting their APIs. As a consequence, the discovery and use of these services by third parties is significantly hampered. In order to achieve further automation while exploiting Web APIs we present an approach for automatically extracting re...

Towards open services on the Web : a semantic approach

Knowledge Media Institute (KMi) Doctor of Philosophy in Computer Science by Dipl.-Inform. Maria Maleshkova The World Wide Web (WWW) has significantly evolved since it was first released as a publicly available service on the Internet, developing from a collection of a few interlinked static pages to a global ubiquitous platform for sharing, searching and browsing dynamic and customisable conten...

Information Discovery, Extraction and Integration for the Hidden Web

In this paper, we report our initial investigations on the problems of automatically extracting data objects from a given hidden-web source (i.e., the web site with an HTML search form) and automatically assigning semantics to the extracted data. We also propose some future work to address the problem of information discovery and integration for hidden-web sources.

Extracting Content Structure for Web Pages Based on Visual Representation

A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page adaptation can benefit from this structure. This paper presents an automatic top-down, tag-tree independent approach to detect web content structure. It simulates how a user understands web layout structure based on ...

Journal:
  • CoRR

Volume: abs/1801.08928

Publication date: 2018